A Topic Segmentation of Texts based on Semantic Domains
نویسندگان
چکیده
1 LIMSI-CNRS. BP 133, 91403 Orsay Cedex, France. email: [ferret,grau]@limsi.fr Abstract. Thematic analysis is essential for many Natural Language Processing (NLP) applications, such as text summarization or information extraction. It is a two-dimensional process that has both to delimit the thematic segments of a text and to identify the topic of each of them. The system we present possesses these two characteristics. Based on the use of semantic domains, it is able to structure narrative texts into adjacent thematic segments, this segmentation operating at the paragraph level, and to identify the topic they are about. Moreover, semantic domains, that are topic representations made of words, are automatically learned, which allows us to apply our system on a wide range of texts in varied domains.
منابع مشابه
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملArabic Document Topic Analysis
Abstract We adopt algorithms for document topic analysis, consisting of segmentation and topic identification, to Arabic. By doing so, we outline the requirements for Arabic language resources that facilitate building, training, and fine-tuning systems that perform these tasks. Our segmentation and topic identification algorithm is based on Probabilistic Latent Semantic Analysis. First results ...
متن کاملTopic Segmentation for Short Texts
Topic segmentation, which aims to fmd the boundaries between topic blocks in a text, is an important task for semantic analysis of texts. Although different solutions have been proposed for the task, many limitations and difficulties exist in the approaches. In particular most of the methods do not work well for such case as short texts, internet news and student's writings. In this paper, we f...
متن کاملThe evolution of the meaning of the word nurse based on the classical texts of Persian literature
Background and Aim: The semantic evolution of a word over time is inevitable, indicating a social, political, religious or cultural process. Nurse is one of the words that has a significant presence in Persian literature texts and has been used in many different meanings such as slave, servan, maid, devotee, obedient, patient and preserver. The purpose of this study is to show its semantic ev...
متن کاملA Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images
Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000